1 INTRODUCTION
In the previous voter turnout post, we provided a vignette on how to visualise voter turnout from the 2024 South African National and Provincial Elections (NPE). The election results produced a seismic shift in the political landscape, introducing coalition governments at national and provincial levels. Coalitions were first introduced at scale after the 2016 local government elections (Joubert 2016).
Interestingly, both elections had a Jacob Zuma effect: first in 2016, through his historically low approval ratings, and again in 2024, through the formation of a new political entity, the Mkhonto WeSizwe Party¹. Another often under-reported story from the election is voter turnout. This post focuses on the relationship between the 2022 Census results and the 2024 National and Provincial Election results. The goal is to determine whether variables in the 2022 Census can help explain regional differences in voter turnout.
2 DATA COLLECTION
We rely on two primary governmental sources: Statistics South Africa (StatsSA) and the Independent Electoral Commission (IEC). The Census dataset is available from the StatsSA website. The post-enumeration process and the publication of results have not been without controversy. Moultrie and Dorrington (2024) found flaws in the census results with policy and planning implications, including an overestimation of the overall population, demographic inaccuracies and inconsistent sub-national spatial estimates, among other issues.
Beyond these reported issues, the 2022 Census 10% Sample omits some crucial variables, such as income, water interruptions longer than two consecutive days, labour outcomes, and mortality and fertility. Since the results were published, there has been a flurry of activity among academics, the media and policy planners to mitigate the issues mentioned above.
StatsSA has also retracted the `SA at a Glance: Census 2022` report, lending credence to the concerns raised about the Census 2022 data. The ramifications of overestimating or underestimating populations are far-reaching for both business and society. For our purposes, we treat the data as estimates rather than a definitive census count until the adjusted data are published.
Apart from an unsuccessful Electoral Court case alleging irregularities (see Zondi and Steyn (2024)), the IEC data is free of any material data collection controversy. Spatially joining the two datasets presents the first challenge. The electoral results are reported by voting district, a sub-ward area designed to make voting efficient, whereas the lowest level of observation in the 2022 Census 10% Sample is the municipality.
To enable a join, we can aggregate the voter turnout data to municipal level and compare outcomes at the same level. However, this decision comes at the cost of nuance. South Africa is a spatially fragmented nation: intra-municipal socio-economic outcomes can vary greatly, even between contiguous neighbours, and voting behaviour can follow a similar pattern.
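To make the trade-off concrete, here is a minimal sketch in R using invented ward values; the data frame `wards` and its columns are hypothetical. It collapses wards to municipalities with the median, as the pipeline in this post does, and also records the range between wards that the median discards.

```r
# Hypothetical ward-level turnout differences (percentage points).
# Values are invented purely to illustrate what aggregation hides.
wards <- data.frame(
  municipality = c("A", "A", "A", "B", "B", "B"),
  ward         = 1:6,
  turnout_diff = c(-12, 1, 9, -3, -2, -4)
)

# Central value per municipality (median, matching the later pipeline)
med <- tapply(wards$turnout_diff, wards$municipality, median)

# Spread per municipality: max minus min across its wards
spread <- tapply(wards$turnout_diff, wards$municipality,
                 function(x) diff(range(x)))

muni <- data.frame(
  municipality = names(med),
  median_diff  = as.vector(med),
  range_diff   = as.vector(spread)
)
print(muni)
```

In this toy data, municipality A ends up with a median of 1 despite a 21-point spread between its wards, while B is genuinely homogeneous. After aggregation, the two look more alike than they really are.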
One approach to overcoming this hurdle is dasymetric mapping, where socio-economic outcomes are effectively redistributed across the estimated population distribution within an area. For our purposes, it suffices to aggregate the voting data to the same level as the census sample dataset.
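For readers curious about the alternative, below is a minimal R sketch of the simplest dasymetric variant: reallocating a municipal total to wards in proportion to an ancillary population layer. The municipal totals, ward populations and column names are all hypothetical; real dasymetric work would typically use richer ancillary data, such as land cover or building footprints.

```r
# Hypothetical municipal totals from a census-like source
muni_totals <- data.frame(
  municipality    = c("A", "B"),
  households_flag = c(1000, 300)  # households with some characteristic
)

# Hypothetical ancillary layer: estimated population per ward
ward_pop <- data.frame(
  municipality = c("A", "A", "A", "B", "B"),
  ward         = 1:5,
  population   = c(5000, 3000, 2000, 4000, 2000)
)

# Each ward's share of its municipality's population
ward_pop$share <- ward_pop$population /
  ave(ward_pop$population, ward_pop$municipality, FUN = sum)

# Spread each municipal total across its wards by population share
dasy <- merge(ward_pop, muni_totals, by = "municipality")
dasy <- dasy[order(dasy$ward), ]
dasy$households_est <- dasy$share * dasy$households_flag

print(dasy[, c("municipality", "ward", "households_est")])
```

Because each municipality's shares sum to one, the ward estimates add back up to the original municipal totals. This volume-preserving property keeps the redistributed surface consistent with the source data.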
2.1 DATA PREPROCESSING
Since the data collection methods differ across the two datasets, it is important to consider the special nature of the Census data: the 10% sample is effectively a complex survey. Zimmer (2024) will serve as an important starting point for aggregating and estimating socio-economic differences across municipalities.
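The practical consequence of treating the sample as a survey is that each record carries a design weight, so municipal estimates should be weighted rather than simple averages. The minimal R sketch below uses a hypothetical household table; the column names `weight` and `piped_water` are illustrative and do not correspond to the actual Census 2022 variable names.

```r
# Hypothetical household sample with design weights (invented values)
hh <- data.frame(
  municipality = c("A", "A", "A", "B", "B"),
  weight       = c(10, 10, 20, 15, 5),
  piped_water  = c(1, 0, 1, 0, 1)
)

# Design-weighted proportion per municipality: sum(w * y) / sum(w)
weighted_share <- sapply(split(hh, hh$municipality), function(d) {
  sum(d$weight * d$piped_water) / sum(d$weight)
})

# Unweighted proportion for comparison (ignores the sample design)
unweighted_share <- tapply(hh$piped_water, hh$municipality, mean)

print(weighted_share)
print(unweighted_share)
```

Here the weighted shares (0.75 and 0.25) differ noticeably from the unweighted ones. This only gives point estimates. Design-based standard errors also need the sample's stratum and cluster identifiers. The R `survey` package, using `svydesign()` together with `svyby()` and `svymean()`, is one common way to obtain them.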
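```r
library(dplyr)  # mutate(), group_by(), summarise(), coalesce()
library(haven)  # read_dta() for the Stata census files
library(sf)     # ward boundaries are stored as an sf object

Ward_Turnout <- readRDS(file = "data/SA_Wards_Turn_Out_Difference_1_2024-06-17.rds")[, c("province", "cat_b", "municipali", "ward_no", "district", "district_co", "ward", "turnout_diff", "geometry")]
Household_Data <- read_dta(file = "data/sa-census-2022-v1-stata/sa-census-2022-household-v1.dta")
Geodataset <- read_dta(file = "data/sa-census-2022-v1-stata/sa-census-2022-geography-v1.dta")

# Drop geometry first: summarise() on an sf object unions the ward polygons,
# which is unnecessary here and can emit a planar-assumption warning.
Municipal_Turnout <- Ward_Turnout |>
  sf::st_drop_geometry() |>
  mutate(turnout_diff = coalesce(turnout_diff, 0)) |>  # treat missing turnout as 0
  group_by(province, cat_b, municipali) |>
  summarise(turnout_diff = median(turnout_diff), .groups = "drop")
```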
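Once census estimates have been aggregated to municipalities, they can be joined to `Municipal_Turnout` and modelled. The sketch below uses stand-in data frames with invented values. The key `cat_b` mirrors the column in `Municipal_Turnout`, but whether the census geography file uses the same municipal codes is an assumption to verify, and `piped_water` is a hypothetical predictor.

```r
# Stand-in for Municipal_Turnout (invented values)
municipal_turnout <- data.frame(
  cat_b        = c("M1", "M2", "M3", "M4"),
  turnout_diff = c(-5, -1, 1, 3)
)

# Hypothetical municipal census estimates keyed on the same code
census_muni <- data.frame(
  cat_b       = c("M1", "M2", "M3", "M5"),
  piped_water = c(0.4, 0.6, 0.8, 0.9)
)

# Inner join: keep only municipalities present in both tables
joined <- merge(municipal_turnout, census_muni, by = "cat_b")

# Codes that failed to match; inspect these before modelling
unmatched <- setdiff(union(municipal_turnout$cat_b, census_muni$cat_b),
                     joined$cat_b)
print(unmatched)

# Simple linear model of turnout difference on the census predictor
fit <- lm(turnout_diff ~ piped_water, data = joined)
print(coef(fit))
```

Checking `unmatched` before fitting matters in practice: municipal boundary changes between data releases can silently drop municipalities from an inner join.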